
    Use of Mel Frequency Cepstral Coefficients for Automatic Pathology Detection on Sustained Vowel Phonations: Mathematical and Statistical Justification

    This paper justifies the use of MFCC parameters for automatic pathology detection in speech. Although this application has produced good results, only partial explanations of that performance had been given previously. The explanation presented here consists of an interpretation of the mathematical transformations involved in MFCC calculation and a statistical analysis that confirms the conclusions drawn from the theoretical reasoning.

    Use of Cepstrum-based parameters for automatic pathology detection on speech. Analysis of performance and theoretical justification

    Most speech signal analysis procedures for automatic pathology detection rely on parameters extracted by time-domain processing. Moreover, calculating these parameters often requires prior pitch period estimation, so their validity depends heavily on the robustness of pitch detection. This paper presents an alternative approach based on cepstral-domain processing that does not require pitch estimation, providing a gain in both simplicity and robustness. While the proposed scheme is similar to solutions based on Mel-frequency cepstral parameters already present in the literature, it has an easier physical interpretation and achieves similar performance.
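
As a rough illustration of cepstral-domain processing, the sketch below computes the real cepstrum of a single frame (the inverse DFT of the log-magnitude spectrum) using only the Python standard library. It is not the parameter set proposed in the paper: the synthetic impulse-train frame, the `floor` constant, and the quefrency search range are all assumptions chosen for the example. It only shows that periodicity appears as a peak in the cepstrum itself, without running a separate pitch detector first.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform, adequate for short frames."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t, v in enumerate(values))
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(frame, floor=1e-12):
    """Inverse DFT of the log-magnitude spectrum of one frame.

    `floor` is an assumed small constant that avoids log(0).
    """
    spectrum = dft(frame)
    log_magnitude = [math.log(abs(s) + floor) for s in spectrum]
    return [c.real for c in dft(log_magnitude, inverse=True)]


# Synthetic "voiced" frame (assumption): impulse train with a 16-sample period.
PERIOD = 16
frame = [1.0 if n % PERIOD == 0 else 0.0 for n in range(128)]
cepstrum = real_cepstrum(frame)

# Strongest cepstral peak inside a hypothetical quefrency search range.
search_range = range(4, 24)
peak_quefrency = max(search_range, key=lambda q: cepstrum[q])
print(peak_quefrency)
```

For this synthetic frame the peak should land at the quefrency equal to the impulse period. Real voice frames would need windowing and a faster transform than this naive DFT.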

    Glottal space detection in laryngeal images using the Watershed transform and JND Merging (Detección del espacio glotal en imágenes laríngeas mediante transformada Watershed y Merging JND)

    This article describes a new method for detecting the glottal space in laryngeal images obtained from high-speed or low-speed videos. The detection process relies on a combination of several well-established techniques from digital image processing. One of these is the Watershed transform which, together with several types of merging and a final linear prediction stage, enables automatic detection in 99% of the images analyzed. The method is further strengthened by requiring no initialization of any kind and no strict conditions on the characteristics of the images to be processed. The algorithm must of course integrate a priori information about the glottal space, but this knowledge is quite relaxed compared with the conditions imposed by other works that also attempt the segmentation.

    Screening voice disorders with the glottal to noise excitation ratio

    This work evaluates the capabilities of the Glottal to Noise Excitation Ratio for screening voice disorders. Much effort has gone into using this parameter to evaluate voice quality, but no studies have evaluated its ability to discriminate between normal and pathological voices. A set of 226 speakers (53 normal and 173 pathological) taken from a voice disorders database was used to evaluate the usefulness of this parameter for that task. To this end, the effects of the bandwidth of the Hilbert envelopes and of the frequency shift were analyzed, concluding that good discrimination is obtained with a bandwidth of 1000 Hz and a frequency shift of 300 Hz. The results confirm that the Glottal to Noise Excitation Ratio provides reliable measurements for discriminating between normal and pathological voices, comparable to other classical long-term noise measurements found in the literature, such as Normalized Noise Energy or Harmonics to Noise Ratio, so this parameter is a good candidate for screening purposes.

    Effects of audio compression in automatic detection of voice pathologies

    This paper investigates the performance of an automatic system for voice pathology detection when the voice samples have been compressed in MP3 format at different bit rates (160, 96, 64, 48, 24, and 8 kb/s). The detectors employ cepstral and noise measurements, along with their derivatives, to characterize the voice signals. The classification is performed using Gaussian mixture models and support vector machines. The results of the different proposed detectors are compared by means of detection error tradeoff (DET) and receiver operating characteristic (ROC) curves, concluding that there are no significant differences in the performance of the detector when the bit rates of the compressed data are above 64 kb/s. This has useful applications in telemedicine, reducing the storage space of voice recordings or allowing them to be transmitted over narrow-band communication channels.
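
To make the ROC comparison concrete, here is a minimal sketch of how an ROC curve and its area can be derived from detector scores. The scores and labels are invented for illustration (1 = pathological, 0 = normal) and do not come from the paper. The function names `roc_points` and `area_under_curve` are likewise hypothetical.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) pairs for a threshold sweep."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical detector scores; higher means "more likely pathological".
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
auc = area_under_curve(curve)
print(round(auc, 3))
```

Running the same computation on the scores obtained at each bit rate would give one curve per compression setting. A DET curve plots the miss rate against the false alarm rate on normal-deviate axes, which this sketch does not attempt.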

    Feature selection in pathological voice classification using dinamyc of component analysis

    This paper presents a methodology for reducing the training space based on analysis of the variation of the linear components of the acoustic features. The methodology is applied to the automatic detection of voice disorders by means of stochastic dynamic models. The acoustic features used to model the speech are MFCC, HNR, GNE, NNE, and the energy envelopes. The feature extraction is carried out by means of PCA, and classification is done using discrete and continuous HMMs. The results showed a direct relationship between the principal directions (feature weights) and the classification performance. The dynamic feature analysis by means of PCA reduces the dimension of the original feature space while the topological complexity of the dynamic classifier remains unchanged. The experiments were carried out on the Kay Elemetrics (DB1) and UPM (DB2) databases. Results showed 91% accuracy with a 30% reduction in computational cost for DB1.
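
The following is a generic sketch of PCA-based dimensionality reduction, not the dynamic feature analysis described in the paper. It estimates the first principal direction of a small feature matrix by power iteration on the sample covariance and projects the data onto it. The toy feature values, the start vector, and the iteration count are assumptions.

```python
import math


def column_means(rows):
    """Mean of each feature (column) over all rows."""
    n, d = len(rows), len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(d)]


def covariance_matrix(rows):
    """Sample covariance of feature vectors given as a list of rows."""
    n, d = len(rows), len(rows[0])
    means = column_means(rows)
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_direction(rows, iterations=100):
    """Dominant eigenvector of the covariance matrix via power iteration."""
    cov = covariance_matrix(rows)
    d = len(cov)
    v = [1.0] * d  # assumed start vector; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(rows, direction):
    """Reduce each centered feature vector to a single coordinate."""
    means = column_means(rows)
    return [sum((r[j] - means[j]) * direction[j] for j in range(len(direction)))
            for r in rows]


# Toy two-dimensional features lying on the line y = 2x (assumption).
features = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction = first_principal_direction(features)
reduced = project(features, direction)
print(direction, reduced)
```

The components of `direction` play the role of feature weights: a feature with a weight near zero contributes little to the retained component. A practical system would use a numerical library and keep more than one component.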

    Acoustic analysis of voice using WPCVox: a comparative study with Multi Dimensional Voice Program

    In this study, two different tools developed for the parametric extraction and acoustic analysis of voice samples are compared. The main goal of the paper is to contrast the results obtained using the classical Multi Dimensional Voice Program (MDVP) with those obtained with the novel WPCVox. The aim of this comparison was to find differences and similarities in the parameters extracted with both systems, in order to enable comparison of measurements and data transfer between the two systems. The study was carried out in two stages: in the first, a wide sample of healthy voices belonging to Spanish-speaking adults of both genders was used to carry out a direct comparison between the results given by MDVP and those obtained with WPCVox. In the second stage, a sample of 200 speakers (53 normal and 173 pathological) taken from a commercially available database of voice disorders was used to demonstrate the usefulness of WPCVox for the acoustic analysis and characterization of normal and pathological voices. The results indicate that WPCVox provides very reliable measurements that are very similar to those obtained using MDVP, and very similar capabilities to discriminate between normal and pathological voices.

    Advanced preprocessing of laryngeal images to improve glottal area segmentation (Preprocesado Avanzado de Imágenes Laríngeas para Mejorar la Segmentación del Área Glotal)

    This work describes an advanced image preprocessing method to improve automatic detection of the glottal space in laryngeal images. The system can be applied to images obtained from high-speed examinations or from stroboscopic (low-speed) examinations, although the greatest advantages are seen in the latter, since those recordings are of lower quality. This new preprocessing technique resolves certain segmentation failures produced by a previous system based on the “Watershed” transform and “Merging”. In summary, 38% of the glottis delineation errors that appeared in 29 images out of a total of 111 segmented are fixed or improved.

    MedivozCaptura: a secure networked application to assist ENT professionals (MedivozCaptura. Una aplicación en red segura de ayuda al profesional de ORL)

    MedivozCaptura is a software tool developed to assist in the analysis and detection of voice pathologies. It is based on storing speech signals, electroglottograms (EGG), and videoendoscopies in a relational database, along with other patient data that specialists may consider relevant. This document describes how the application operates in a distributed, networked manner with a centralized database, as well as the security and performance issues posed by distribution over a network or the Internet and how they are addressed in MedivozCaptura.

    Extracting a singer's pitch contour by combining classical algorithms (Extracción de la curva de tono de un cantante mediante combinación de algoritmos clásicos)

    Algorithms for automatic extraction of the pitch (fundamental frequency) contour produce errors that must often be corrected by subsequent manual or automatic processing. This communication presents a system for extracting the pitch contour that uses the results of six classical algorithms, combined to avoid errors and to provide the most reliable of the six computed values. The algorithm has been tested by extracting the pitch contour of a singer performing a song, and the results are satisfactory.
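
The abstract does not say how the six estimates are combined. One simple possibility, shown here purely as an assumption, is a per-frame median, which discards isolated octave errors as long as most trackers agree. The pitch values below are invented for the example.

```python
from statistics import median


def combine_pitch_tracks(tracks):
    """Per-frame median over several pitch tracks of equal length (Hz)."""
    return [median(frame_values) for frame_values in zip(*tracks)]


# Hypothetical output of six pitch trackers over three frames (Hz).
tracks = [
    [220.0, 221.0, 223.0],
    [221.0, 222.0, 224.0],
    [110.0, 221.0, 223.0],  # octave-down error in frame 0
    [219.0, 442.0, 222.0],  # octave-up error in frame 1
    [440.0, 220.0, 224.0],  # octave-up error in frame 0
    [220.0, 222.0, 223.0],
]

combined = combine_pitch_tracks(tracks)
print(combined)
```

With an even number of trackers, the median averages the two middle values, so the combined contour can fall between two individual estimates. A scheme that picks the single most reliable value per frame, as the abstract suggests, would need a reliability measure that is not described here.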